Experience-driven formation of parts-based representations in a model of layered visual memory
Growing neuropsychological and neurophysiological evidence suggests that the
visual cortex uses parts-based representations to encode, store and retrieve
relevant objects. In such a scheme, objects are represented as a set of
spatially distributed local features, or parts, arranged in stereotypical
fashion. To encode the local appearance and to represent the relations between
the constituent parts, there has to be an appropriate memory structure formed
by previous experience with visual objects. Here, we propose a model of how a
hierarchical memory structure supporting efficient storage and rapid recall of
parts-based representations can be established by an experience-driven process
of self-organization. The process is based on the collaboration of slow
bidirectional synaptic plasticity and homeostatic unit activity regulation,
both running on top of fast activity dynamics with winner-take-all
character modulated by an oscillatory rhythm. These neural mechanisms provide
the basis for cooperation and competition between the distributed units and
their synaptic connections. Choosing human face recognition as a test task, we
show that, under the condition of open-ended, unsupervised incremental
learning, the system is able to form memory traces for individual faces in a
parts-based fashion. On a lower memory layer, the synaptic structure develops
to represent local facial features and their interrelations, while
the identities of different persons are captured explicitly on a higher layer.
An additional property of the resulting representations is the sparseness of
both the activity during recall and the synaptic patterns comprising the
memory traces.
Comment: 34 pages, 12 figures, 1 table; published in Frontiers in Computational Neuroscience (Special Issue on Complex Systems Science and Brain Dynamics), http://www.frontiersin.org/neuroscience/computationalneuroscience/paper/10.3389/neuro.10/015.2009
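The interplay of mechanisms named in the abstract can be sketched compactly. Below is a minimal NumPy sketch under our own assumptions (unit counts, learning rates, and the simple Hebbian rule are illustrative, and the paper's bidirectional plasticity and oscillatory modulation are not modeled): fast winner-take-all competition selects a unit, slow plasticity moves its weights toward the input, and a homeostatic term equalizes how often units win, so distributed units come to cover distinct local features.

```python
import numpy as np

# Illustrative sketch only: a single layer of competing units with
# winner-take-all (WTA) activity, slow Hebbian-style plasticity, and
# homeostatic regulation of unit activity. Constants are arbitrary.

rng = np.random.default_rng(0)
n_inputs, n_units = 64, 16            # e.g. a pixel patch -> feature units
W = rng.random((n_units, n_inputs))
W /= np.linalg.norm(W, axis=1, keepdims=True)
win_rate = np.full(n_units, 1.0 / n_units)  # running activity estimate
eta, homeo, decay = 0.05, 0.5, 0.99

def step(x):
    """One fast-dynamics step: WTA competition plus slow plasticity."""
    # Homeostasis: units that win too often are handicapped, so every
    # unit eventually claims a share of the input space (competition).
    scores = W @ x - homeo * win_rate
    winner = np.argmax(scores)        # winner-take-all activity
    # Slow plasticity: move the winner's weights toward the input.
    W[winner] += eta * (x - W[winner])
    W[winner] /= np.linalg.norm(W[winner])
    # Homeostatic bookkeeping: update the running win-rate estimates.
    win_rate[:] = decay * win_rate
    win_rate[winner] += 1.0 - decay
    return winner

for _ in range(1000):                 # drive with random "local features"
    step(rng.random(n_inputs))
```

After training, each row of W approximates a prototype of a recurring local feature; the paper's layered model stacks such structure, capturing identities explicitly on a higher layer.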
Effect of large-scale pre-training on full and few-shot transfer learning for natural and medical images
Transfer learning aims to exploit pre-trained models for more efficient
follow-up training on a wide range of downstream tasks and datasets, enabling
successful training even on small data. A recent line of work posits strong
benefits for model generalization and transfer when model size, data size, and
compute budget are increased during pre-training. However, it remains largely
unclear whether the observed transfer improvement due to increased scale also
holds when source and target data distributions are far apart from each
other. In this work, we conduct large-scale pre-training on large source
each other. In this work we conduct large-scale pre-training on large source
datasets of either natural (ImageNet-21k/1k) or medical chest X-Ray images and
compare full and few-shot transfer using different target datasets from both
natural and medical imaging domains. Our observations provide evidence that,
while pre-training and transfer on closely related datasets show a clear
benefit of increasing model and data size during pre-training, such benefits
are not clearly visible when source and target datasets are further apart.
These observations hold across both full and few-shot transfer and indicate
that scaling laws which point to improved generalization and transfer with
increasing model and data size are incomplete: to correctly predict the effect
of model and data scale during pre-training on transfer, they should also take
into account the type and proximity of the source and target data.
Remarkably, in full-shot transfer to a large chest X-Ray imaging target
(PadChest), the largest model pre-trained on ImageNet-21k slightly outperforms
the best models pre-trained on large chest X-Ray imaging data. This indicates
that high-quality models for domain-specific transfer can be obtained even
without access to large domain-specific data, by pre-training instead on
comparably much larger, generic source data.
Comment: Preprint. Under review.
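To make the transfer setup concrete, here is a minimal PyTorch/torchvision sketch of few-shot fine-tuning of a pre-trained model. The backbone choice (ResNet-50 with ImageNet-1k weights), the hyperparameters, and the synthetic few-shot batch are our illustrative assumptions; the paper evaluates much larger models on real natural and medical target datasets.

```python
import torch
import torch.nn as nn
from torchvision import models

# Illustrative sketch: take a model pre-trained on a large source dataset,
# replace its classification head, and fine-tune on only a few labeled
# target examples per class (the few-shot transfer regime).

num_classes, k_shot = 3, 5            # hypothetical small target task

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new head

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Random tensors stand in for a real target-domain few-shot loader.
x = torch.randn(num_classes * k_shot, 3, 224, 224)
y = torch.arange(num_classes).repeat_interleave(k_shot)

model.train()
for _ in range(5):                    # a few fine-tuning steps
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```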
Obstacle Tower Without Human Demonstrations: How Far a Deep Feed-Forward Network Goes with Reinforcement Learning
The Obstacle Tower Challenge is the task of mastering a procedurally generated
chain of levels that become progressively harder to complete. Whereas the
top-performing entries of last year's competition used human demonstrations or
reward shaping to learn how to cope with the challenge, we present an approach
that performed competitively (placed 7th) but starts completely from scratch,
by means of Deep Reinforcement Learning with a relatively simple feed-forward
deep network structure. We look in particular at the generalization
performance of our approach across different seeds and the various visual
themes that became available after the competition, and investigate where the
agent fails and why. Note that our approach does not possess short-term
memory, such as recurrent hidden states. With this work, we hope to contribute to a
better understanding of what is possible with a relatively simple, flexible
solution that can be applied to learning in environments featuring complex 3D
visual input where the abstract task structure itself is still fairly simple.
Comment: 8 pages, 9 figures, 2 tables; under review.
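For concreteness, the kind of memoryless policy network the abstract describes can be sketched in a few lines of PyTorch. The layer sizes and the 84x84 input follow the common Nature-CNN pattern and are our assumptions, not the authors' exact architecture; the essential property is that the action logits depend only on the current frame, with no recurrent hidden state.

```python
import torch
import torch.nn as nn

# Illustrative sketch of a purely feed-forward actor-critic network:
# no recurrence, so each action distribution is computed from the
# current visual observation alone.

class FeedForwardPolicy(nn.Module):
    def __init__(self, n_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(          # Nature-CNN-style encoder
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():                  # infer flattened feature size
            n_flat = self.encoder(torch.zeros(1, 3, 84, 84)).shape[1]
        self.policy = nn.Sequential(nn.Linear(n_flat, 512), nn.ReLU(),
                                    nn.Linear(512, n_actions))
        self.value = nn.Sequential(nn.Linear(n_flat, 512), nn.ReLU(),
                                   nn.Linear(512, 1))

    def forward(self, obs):
        h = self.encoder(obs)
        return self.policy(h), self.value(h)   # action logits, state value

# One frame in, one action distribution out -- no hidden state carried
# between time steps.
logits, value = FeedForwardPolicy(n_actions=8)(torch.zeros(1, 3, 84, 84))
```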
DataComp: In search of the next generation of multimodal datasets
Multimodal datasets are a critical component in recent breakthroughs such as
Stable Diffusion and GPT-4, yet their design does not receive the same research
attention as model architectures or training algorithms. To address this
shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset
experiments centered around a new candidate pool of 12.8 billion image-text
pairs from Common Crawl. Participants in our benchmark design new filtering
techniques or curate new data sources and then evaluate their new dataset by
running our standardized CLIP training code and testing the resulting model on
38 downstream test sets. Our benchmark consists of multiple compute scales
spanning four orders of magnitude, which enables the study of scaling trends
and makes the benchmark accessible to researchers with varying resources. Our
baseline experiments show that the DataComp workflow leads to better training
sets. In particular, our best baseline, DataComp-1B, enables training a CLIP
ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet, outperforming
OpenAI's CLIP ViT-L/14 by 3.7 percentage points while using the same training
procedure and compute. We release DataComp and all accompanying code at
www.datacomp.ai.
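As one example of the kind of filtering technique a participant might evaluate, below is a minimal sketch of CLIP-score filtering, keeping only image-text pairs whose image and caption embeddings align well. It uses the Hugging Face transformers CLIP implementation; the checkpoint and threshold are illustrative assumptions, not DataComp's prescribed baseline.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative sketch: score a candidate image-text pair by the cosine
# similarity of its CLIP embeddings, and keep it only if the score
# exceeds a threshold. Checkpoint and threshold are arbitrary choices.

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

@torch.no_grad()
def clip_score(image: Image.Image, caption: str) -> float:
    """Cosine similarity between CLIP image and text embeddings."""
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True, truncation=True)
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())

def keep_pair(image: Image.Image, caption: str,
              threshold: float = 0.28) -> bool:
    """Filtering rule: keep only well-aligned image-text pairs."""
    return clip_score(image, caption) >= threshold
```

A filtering pipeline of this kind would apply keep_pair across the candidate pool before running the standardized CLIP training code on the surviving pairs.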